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ABSTRACT 

A maximum-likelihood method is presented for estimating the power spectrum of 
anisotropies in the cosmic microwave background (CMB) from interferometer obser- 
vations. The method calculates flat band-power estimates in separate bins in ^-space, 
together with confidence intervals on the power in each bin. For multifrequency data, 
the power spectrum of the other foreground components may also be recovered. Advan- 
tage is taken of several characteristic properties of interferometer data, which together 
allow the fast calculation of the likelihood function. The method may be applied to 
single-field or mosaiced observations, and proper account can be taken of non-coplanar 
baselines. The method is illustrated by application to simulated data from the Very 
Small Array. 

Key words: cosmic microwave background - methods: data analysis - methods: 
statistical. 



1 INTRODUCTION 

Interferometers have proven themselves to be valuable 
tools in observing anisotropies in the cosmic microwave 
background (CMB). They are inherently insensitive to 
anisotropies in atmospheric emission on scales larger than 
the beam and to ground spillover, and the fact that they 
sample Fourier space directly make interferometers ideal in- 
struments for measuring the CMB power spectrum. 

A method for calculating the maximum-likelihood CMB 
power spectrum directly from the complex visibility data 
produced by an interferometer was first discussed by Hob- 
son, Lasenby & Jones (1995) (hereafter HLJ95). The method 
was illustrated by assuming the CMB anisotropies to be 
described by a Gaussian autocorrelation function, but the 
technique is easily modified so that the CMB power spec- 
trum is parameterised in terms of flat band-powers in sepa- 
rate bins in ^-space. Indeed, the modified HLJ95 algorithm 
was used to calculate flat band-power estimates of the CMB 
power spectrum in two spectral bins at £ ~ 410 and £ ~ 590 
(with bin-width A£ « 90) for the Cosmic Anisotropy Tele- 
scope (CAT), which is a 3-element interferometer (Scott et 
al. 1996; Baker et al. 1999) 

Following the early success of the CAT, a new genera- 
tion of CMB interferometers have been built, and have re- 
cently made high-sensitivity observations of the CMB. These 
experiments include the Very Small Array (VSA) (Jones 
1997; Jones & Scott 1998), the Degree Angular Scale Inter- 
ferometer (DASI) (Leitch et al. 2001; Halverson et al. 2001) 
and the Cosmic Background Imager (CBI) (Pearson et al. 



2000; Padin et al. 2001). Although the detailed design of 
these experiments is different in each case, the basic princi- 
ples underlying their operation are the same. In particular, 
these instruments have a larger number of antennas than 
the CAT (for example, the VSA has 14 horns) and more 
sensitive detectors. These specifications enable the accurate 
measurement of the CMB power spectrum over a wide range 
of angular scales. For example, the VSA in its 'compact' con- 
figuration (see section]^) measures the CMB power spectrum 
in 10 independent bins of width A£ w 90 from I « 80 - 950. 

Some discussion of how to obtain flat band-power esti- 
mates of the CMB power spectrum from this new generation 
of interferometer experiments has been presented by White 
et al. (1999a) and White et al. (1999b), mainly in connec- 
tion with the analysis of DASI data. Indeed, the techniques 
outlined by these authors have been applied to the analy- 
sis of DASI and CBI observations. In particular, the DASI 
experiment has recently produced an accurate determina- 
tion of the CMB power spectrum in 9 spectral bins of width 
Al « 80 in the range £ « 100 - 900 (Halverson et al. 2001), 
whereas the CBI has measured flat band-power estimates of 
the CMB power spectrum in two spectral bins at £ w 600 
and I ~ 1200, with a spectral resolution of Al ~ 400 (Padin 
et al. 2001). 

The increase in both the amount of visibility data and 
the number of independent spectral bins means that the 
computational burden of performing a likelihood analysis of 
the new generation of interferometer experiments is consid- 
erable. It is therefore of interest to investigate fast meth- 
ods of calculating the likelihood function for interferome- 
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ter observations of the CMB. In this paper, we extend the 
discussion given by HJL95 and White et al. (1999a, 1999b) 
and present a complete description of a maximum-likelihood 
technique for calculating flat band-power estimates of the 
CMB power spectrum from interferometer data. Following 
HJL95, the technique allows for contributions to the visibil- 
ities from Galactic foreground emission, which can be sepa- 
rated out if multifrequency data are available. In terms of the 
computational algorithm, we also present some straightfor- 
ward devices for speeding up the calculation of the likelihood 
function, which take advantage of certain useful properties 
of interferometer data. Given the facility of fast evaluation 
of the likelihood function, we then investigate the relative 
merits of obtaining the maximum-likelihood flat band-power 
estimates, and their corresponding confidence intervals, ei- 
ther by direct evaluation the likelihood distribution, or by 
using traditional numerical maximisation techniques. 

Our formalism makes no special assumptions specific 
to a particular experiment, and so can be applied to data 
from any CMB interferometer, including observations of mo- 
saiced fields, and proper account is taken of non-coplanar 
baselines. The method is illustrated by applying it to sim- 
ulated data from the Very Small Array (VSA). The ap- 
plication of the technique to real VSA observations will 
be presented in a forthcoming paper. We note, in passing, 
that a maximum-entropy map-making method for interfer- 
ometer observations, which can simultaneously deconvolve 
the interferometer beam and separate CMB and Galactic 
foreground emission, is discussed by Maisinger, Hobson & 
Lasenby (1997). 



2 INTERFEROMETER OBSERVATIONS 

Let us consider how an interferometer performs an observa- 
tion of the CMB. We begin by discussing the likely contri- 
bution to the visibility data due to emission from different 
physical components at microwave wavelengths. 



2.1 Model of the microwave sky 

The total sky intensity I(x, v) at a frequency v in a direction 
x will, in general, contain contributions from the CMB and 
several foreground components, such as Galactic free-free 
and synchrotron emission, as well as emission from extra- 



galactic radio point sources. As we discuss in section 3.1 



contamination by point source emission is usually addressed 
by making simultaneous high-resolution observations of the 
sources, which may then be used to subtract the point source 
contribution from the data. We will therefore assume that 
point emission has been subtracted in this way, so that the 
remaining contamination of the CMB signal is due to diffuse 
Galactic emission. 

We may express the fluctuations in the sky intensity as 
a sum over those due to each of these diffuse components 



AI{x,v) = ^2AI p (x, 



(1) 



where N p is the number of distinct physical components con- 
tributing to the total sky emission. For the CMB component, 



it is more usual to work in terms of equivalent thermody- 
namic temperature fluctuations AT cm b(i), which is related 
to the intensity fluctuations by 



Alcmbfx, v) 



dB{v, T) 



dT 



AT cmh (x). 



T=T 



where B(v, T) is the Planck function and To = 2.726 K is 
the mean temperature of the CMB (Mather et al. 1994). 
The conversion factor can be approximated by 



dB{u, T) 



c)T 



24.8 



sinh x/2 



Jy sr" 1 ( M K)" 



T=T 

where x = hv/kTo w (i//56.8 GHz). For simplicity we define 
the brightness temperature fluctuations for the foreground 
components in a similar way. Thus, for the pth physical com- 
ponent of emission, we define 

AT (x iA - 

We note that, in general, the 'temperature' fluctuations of 
the foreground components will be frequency dependent, un- 
like those of the CMB. 

Temperature fluctuations on the sky are usually de- 
scribed by an expansion in spherical harmonics, so for each 
component we have 



AT p {x,v) 
T 



EE 



» 



{v)Y tm {x), 



from which we define the ensemble-average power spectrum 
of the pth component at some reference frequency by 

4 rt M = <l4 p >o)| 2 ). 

Since the power spectrum of the CMB is usually quoted in 
terms of the power per unit logarithmic interval in £, it is 
also useful to define the quantities 



<Vo) = ^ + i)4 p) M. 

For observations of small patches of sky, however, the 
spherical harmonic expansion is awkward to apply, and it is 
more convenient to use Fourier analysis, so that 



AT p (x,i 
T 



a (u, v) exp(2iriu ■ x) d 2 u, 



where a^ p '(u, v) is the Fourier transform of the temperature 
fluctuations in the pth physical component at an observing 
frequency v. The reason for including the factor of 27r in the 
exponent will become clear shortly, when we consider how an 
interferometer observes the sky. The two-dimensional trans- 
form variable u is measured in wavelengths, and for later 
convenience we use p = \u\ to denote radial distances from 
the origin in the Fourier domain. For small fields, p is related 
to the multipole £ in the spherical harmonic expansion by 
p m £/2tt; the exact relationship is p = i cot(n/£) (see, for 
example, Hobson & Magueijo 1996). Since Fourier transfor- 
mation is a linear operation, we may simply sum over the 
Fourier modes for each component to obtain an expression 
analogous to ([!]) for the Fourier transform of the total sky 
fluctuations 



a(u, v) = ^ ^ a' p ' (u, v) 
p=i 
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It is convenient to separate the dependence of the 
Fourier modes on frequency v and on position in the Fourier 
domain u. We are thus assuming that the spectral index of 
any Galactic emission does is not spatially varying over the 
observed field. For the compact VSA, for example, the ex- 
tent of a single field is ~ 4.6 degrees, and so this assumption 
is not too severe. For mosaiced observations (see section 2. i) 



of larger areas of sky, however, this assumption may be ques- 
tionable. We describe the frequency variation of each compo- 
nent through the functions f p (u) (p = 1, 2, . . . , N p ). In this 
paper, we take the reference frequency vo = 30 GHz and 
normalise the frequency dependencies so that f p (ya) = 1 for 
all physical components. 

If the emission in each component is statistically 
isotropic over the observed field and no correlations exist 
between the components, then the Fourier modes have mean 
zero and their covariance properties are given by 



{a {p) {u,v)[a (v ' ) {u',v')Y) 



C {p \p,v )f p (v)f p ,(v')5 pp ,5(u- 



(2) 



where p = \u\ and C p ' (p, vq) is the ensemble-average power 
spectrum of the pth physical component at the reference fre- 
quency i/o- Since the sky is real, the Fourier modes also obey 
[a^ (u, v)]* = a ( — v). We note that the assumption 
of rotational invariance means that the ensemble-average 
power spectrum for each component is azimuthally sym- 
metric in the Fourier domain and hence a function of p. 
For I <; 60, we make contact with the spherical harmonic 
expansion by making the approximation 



(p)/ 



-2-Kp 



which is accurate to within one per cent. If we define the 
quantity D^ p \p,vo) = p 2 C^ p ' (p, fo), then in a similar way 
we find 



D (p \p,vo) 



(27T) 



2 



?=2irp 



2.2 Visibility data 

Expressing the temperature fluctuations of each component 
in terms of the Fourier modes a' p '(it, u) is particularly use- 
ful when discussing interferometer observations. Assuming 
a small field size, and ignoring instrumental noise for the 
moment, an interferometer measures samples from the com- 
plex visibility plane S(u, v) arising from the sky signal. After 
converting to temperature units, this is given by 



S(u, v) 



A(x, 



AT(a 



T 



■ exp(27riu • x) d x, 



(3) 



where x is the position relative to the phase centre, A(x, v) 
is the (power) primary beam of the antennas at the observ- 
ing frequency v (normalised to unity at its peak), and u 
is a baseline vector in units of wavelength. In the following 
discussion, we assume the primary beam of a single antenna 
is circular symmetric and so does not vary with parallactic 
angle. 

From (^), we see that S(u,v) is the Fourier transform 
of the product of the sky temperature fluctuations and the 
primary beam, which is equivalent to the convolution of the 



underlying Fourier modes a(u, v) with the Fourier transform 
of the primary beam A(u, v), 



S(u, v) = a(u, v) * A(u, v). 



(4) 



The positions in the w-plane at which this function is sam- 
pled by the interferometer are determined by the physical 
positions of its antennas and the direction of the field on the 
sky. The samples Uj lie on a series of curves (or uw-tracks), 
which we may denote by the function B(u,u) that equals 
unity where the Fourier domain (or ww-plane) is sampled and 
equals zero elsewhere. The function B(u,v) may be inverse 
Fourier transformed to give the synthesised beam B(x, v) of 
the interferometer at an observing frequency v. For a re- 
alistic interferometer, the sample values will also contain a 
contribution due to instrumental noise. Thus, at an observ- 
ing frequency u, the jth baseline Uj of an interferometer 
(measured in wavelengths) measures the complex visibility 

V(uj,u) — S(uj,u) +Af(uj,v), 

where M(uj , v) is the instrumental noise on the jth visibil- 
ity. 

The effects of the convolution of the underlying Fourier 
modes with the aperture function and subsequent sampling 
are illustrated in Fig. |l[ for the case when only CMB emis- 
sion is present (and neglecting instrumental noise). In panel 
(a), we plot the modulus-squared of the underlying Fourier 
modes for a realisation of an inflationary CDM model with 
a Hubble parameter Ho = 65 km s _1 Mpc -1 and a frac- 
tional baryon density fib = 0.05. The model is spatially flat 
with Q m = 0.3 and Q,a ~ 0.7, a primordial scalar spectral 
index n = 1, and no tensor modes. The spectrum is nor- 
malised to COBE. We note the presence of acoustic 'rings' 
in this two-dimensional power spectrum. We also see that 
the different Fourier modes are uncorrelated. In panel (b), 
we plot the same realisation but after convolution with an 
aperture function corresponding to an interferometer with 
a Gaussian primary beam of 4.6° FWHM. We see that the 
Fourier modes are no longer all independent, but appear 
correlated on lengths scales corresponding to the width of 
aperture function. The 1/e diameter of this function is plot- 
ted as the solid disc bottom left-hand corner of the figure. 
In panel (c) we overlay a typical 5-h Mv-coverage for a com- 
pact VSA observation of a single field at a declination of 
30°. This instrument consists of 14 antennas with physical 
separations lying between 20 cm and 1.50 m (see section ^). 
Each point corresponds to a sample with an integration time 
of 64 s. 



2.3 Mosaicing 

From (^), we see that in order to obtain a high resolution in 
Fourier space and thus in ^-space, the Fourier transform of 
the primary beam needs to be narrow, implying large pri- 
mary beams and small antennas. On the other hand, high 
sensitivity demands a large collecting area for each antenna. 
Clearly, some compromise has to be established between sen- 
sitivity and ^-resolution, and the current generation of CMB 
interferometers differ in their respective priorities. However, 
the problem can be circumvented by a technique called mo- 
saicing (Ekers & Rots 1979, Cornwell 1988, Cornwell et al. 
1993, Holdaway 1999). Mosaicing consists of multiple ob- 
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Figure 1. (a) A realisation of the two-dimensional CMB power 
spectrum for an inflationary CDM model (see text), (b) The same 
realisation as in (a), but convolved with the aperture function of 
our model interferometer, assuming a Gaussian primary beam of 
4.6° FWHM. (c) The nu-coverage for a simulated 5-h observation 
with a 14-element interferometer; each point corresponds to a 
sample with an integration time of 64-s. 



servations of adjacent, overlapping fields on the sky. This 
increases the effective beam size of the telescope and thus 
the ^-resolution. 

Mosaicing exploits that a telescope samples along each 
baseline a whole superposition of visibilities. This is due to 
the finite extent of the the antenna aperture, which means 
that effectively all baselines corresponding to any two points 
on the two antennas that make up the baseline are sampled. 
Observations with different pointing centres probe different 
superpositions and can thus help to disentangle the individ- 
ual contributions (Ekers & Rots 1979, Cornwell 1988). 

Unfortunately, observations of overlapping fields also re- 
quire that we take into account the correlations between the 
fields. In Appendix ^ we show that the effect of mosaicing 
may be be included by defining an effective primary beam 
A e ff(x,v) that replaces A(x,v) in ([]). 

2.4 Non-coplanar baselines and ^-corrections 

For telescopes with a large primary beam A(x, v), the two- 
dimensional approximation ([|) becomes inaccurate. Radia- 
tion incident away from the telescopes pointing centres suf- 
fers from phase errors if the w-components of the telescope 
baselines (i.e. the component in the direction of the source) 
do not vanish, or if the baselines are not coplanar. This is a 
geometric effect and is usually referred to as w-distortion or 
as non-coplanar baselines (see e.g. Cornwell & Perley 1993; 
Perley 1999). 

In order to obtain a good resolution in Fourier space, 
the VSA has a comparatively large primary beam. While 
the antennas are all mounted on the same steerable table, 
they still track the source individually. This delay-tracking 
helps to identify foregrounds or other systematic errors by 
their different fringe rates. However, in the course of a longer 
observation, the projection of the source onto the telescope 
table changes and the array is effectively no longer coplanar. 

This topic is discussed in detail in Appendix |c| where 
we show that, similarly to mosaicing, again the effects of 
the non-coplanar baselines may be included by defining an 
appropriate effective primary beam A a s(x,v) that replaces 
A(x, v) in (^j). Thus, the following discussion can be straight- 
forwardly generalised to large fields. 



3 PRELIMINARY ANALYSIS 

As discussed in HLJ95, before one can perform any mean- 
ingful statistical analysis, it is necessary to remove as many 
unwanted contributions as possible from the visibilities. It 
is also useful to compress the data sufficiently, such that the 
computational burden of the subsequent likelihood analysis 
is somewhat reduced. 



3.1 Removal of bad data and point sources 

Preliminary analysis of the time-ordered visibility data con- 
sists of the removal of periods of bad data due to poor 
weather conditions and system failures, and the subtrac- 
tion of the contribution to the sky signal from identified 
discrete radio sources above some confusion limit. For ex- 
ample, O'Sullivan et al. (1995) discuss how this preliminary 
data analysis was carried out for CMB observations with 
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the CAT interferometer, using the Ryle Telescope (Jones 
1990) to perform the radio source subtraction. For VSA ob- 
servations, the systematic removal of periods of bad data 
are performed in an analogous manner to that used for the 
CAT. The subtraction of point sources from VSA data again 
relies on Ryle Telescope observations to identify and deter- 
mine the fluxes of all sources above some flux limit in the 
observed field. This process is also assisted by the use of two 
source-subtraction antennas situated adjacent to the main 
VSA instrument. The details of the point source identifica- 
tion and subtraction techniques used for VSA observations 
is discussed in Taylor (2000) and Taylor et al. (2001). 

After periods of bad data and point source emission 
have been removed, the remaining non-cosmological contri- 
butions to the visibilities will then be from unsubtracted 
point sources below the flux limit of the observations, and 
from fluctuations in the Galactic free-free, synchrotron and 
dust emission. These foregrounds are usually identified by 
their spectral differences from the CMB, using multifre- 
quency observations. 



3.2 'Map'-making in the Fourier domain 

For the new generation of CMB interferometers, the total 
number of visibilities at each frequency is very large. For 
example, the total number of 64-s samples in a typical 5- 
h observation with a 14-element interferometer is approxi- 
mately 25,000, and any given field is observed for at least 
several days. It is therefore necessary to compress these data 
in some way, as discussed in HLJ95. This is analogous to the 
'map-making' step in the analysis of single-dish CMB exper- 
iments, in which time-ordered data is binned into pixels on 
the sky (see, for example, Borrill 1999). For an interferome- 
ter, however, the visibilities at each observing frequency are 
binned into cells in the ww-plane. 

Since we are not interested in making accurate CMB 
maps from the binned data, but are more concerned with 
estimating the CMB power spectrum correctly, the uu-plane 
is simply divided into equal-area square cells (or pixels) of 
side Au, within each of which S(u, v) is assumed to be con- 
stant. Clearly, only a small number N c of these cells will 
contain observed visibility samples. We assemble the (com- 
plex) values in each observed pixel into the (complex) vector 
s of length N c . We then calculate the maximum-likelihood 
solution for these N c values as follows. 

Suppose that the total number of visibility samples at 
the observing frequency v is N v , and that the jth visibility 
is measured at a point Uj in the uw-plane. Let us assemble 
these visibilities into the (complex) data vector v of length 
N v , and also define the N v x N c pointing matrix with ele- 
ments 



M 



jk 



1 if Uj lies in the fcth cell, 
otherwise. 



In this way, we may write the data vector as 

v = Ms + n, (5) 

where n is the (complex) noise vector of length N v , whose 
jih element is simply N(iij,v). Assuming that the instru- 
mental noise on the real and imaginary parts of the visibil- 
ities are independent and Gaussian, it is described by the 



complex multivariate Gaussian probability distribution (see 
Eaton 1983) 



Pr(n) = 



exp(— n^N 1 n 



,jv„| N 



(6) 



where N = (nn^) is the instrumental noise covariance ma- 
trix. Substituting (^) into (^|), we find that the likelihood 
of obtaining the observed visibility data given a particular 
signal vector s is given by 



Pr(v|s) 



IN 



exp[-(v - Ms) f N x (v - Ms) 



Maximising over s then yields the maximum-likelihood solu- 
tion, which we will denote by v (instead of the more usual s). 
We find that this vector of binned visibility data (of length 
N c ) is given in terms of the original visibility data vector v 
(of length N v ) by 



v = (M t N _1 M) _1 M t N _1 w. 



(7) 



Moreover, if we substitute (|J) back into (j?j) , we find that the 
vector of binned visibilities can be written as 

v = (M*N X M) 1 M t N~ 1 (Ms + n) = s + n, 

where 

n = (M t N -1 M) -1 M t N -1 n. 

This shows that the binned visibility data v is the sum of 
the pixelised signal s and some residual pixelised noise n 
with covariance matrix 



N = (nn f ) = (M t N~ 1 M) 



(8) 



It is instructive to consider the special case in which the 
original instrumental noise covariance matrix N is diagonal, 
so that the instrumental noise on each original visibility is 
uncorrelated. In fact, this is usually the case for interferom- 
eter data, and we may write N = diag(ai, . . . , a% ), where 
is the variance of the instrumental noise on the j'th origi- 
nal visibility. In this case, it is straightforward to show that 
the maximum-likelihood solution (bj) reduces to simple cell- 
averaging. Thus, the value Vk of the binned visibility in the 
fcth cell is given by 



Vk = 



where m is the number of original visibilities lying in that 
cell. Similarly, from we find that the residual pixelisa- 
tion noise has the covariance matrix N 
where 

2 1 



diag(e?, . 



,2 

! £ JV C 



At each observing frequency v, it is the binned visi- 
bility vector v and pixelised noise vector n that constitute 
the basic data which are analysed in the estimation of the 
CMB power spectrum. It is usual to associate the fcth binned 
visibility Vk and noise n*, with the the position uj, that cor- 
responds to the centre of the fcth cell in the uu-plane. This 
is, however, not necessary. Indeed, one may preserve more 
information regarding the distribution of samples in the uv- 
plane by instead choosing Uk to be the 'centre-of-mass' of 
the positions of the visibility samples within the fcth cell. 

It only remains to choose the size of the cells Au in the 
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MM-plane. Since the speed with which the likelihood func- 
tion can be calculated is strongly dependent on the number 
of binned visibility points that are analysed, we must make 
a compromise between how accurately the binned data rep- 
resent the original visibilities, and the desire to use fewer 
data points in order to speed up the calculation. In fact, 
two natural limits exist for Am. 

A minimum size for the uv cells is given by the require- 
ment that each binned point on the grid represents an inde- 
pendent estimate of the underlying visibility. If the typical 
sample time per visibility is r sec, then for samples at a dis- 
tance from the MW-origin of |u|, time-smearing will correlate 
visibilities separated along a w-track by less than approxi- 
mately 

Am « 2n\u\ 



1 24 x 60 x 60 



For t = 64 sec and MM-coverage which extends out to \u\ ~ 
200 wavelengths, then Am <; 1 wavelengths. 

A maximum size for the uv cells is derived by consider- 
ing the fact the measured visibilities are obtained by a convo- 
lution of the underlying Fourier modes with the interferom- 
eter aperture function. For example, the VSA in its compact 
configuration, this function is a Gaussian a FWHM of ap- 
proximately 12 wavelength. Hence, as discussed by HLJ95, 
from the sampling theorem we require Am ?S 6 wavelengths. 
Nevertheless, finer binning is still useful since it provides 
better approximation for the position of the visibilities in 
the Mt)-plane. Thus, we find that the uv cell size is reason- 
ably well determined by our two limiting cases and for the 
simulations considered here we use Am = 3 wavelengths. 

As an illustration, in Fig. |^ we plot the positions of 
the binned visibilities corresponding to the samples shown 
in Fig. |l](c) . Each dot shows the position Uk of the 'centre- 
of-mass' of each cell, with which the binned visibility Vk 
is associated. In this case, the number of non-zero binned 
visibilities is approximately 2500. Provided the geometry of 
the interferometer is not varied, each period of observation 
of a given field will produce visibility samples lying in the 
same set of cells in the MM-plane. Thus, after binning the total 
number of complex data points at each observing frequency 
that are used to estimate the CMB power spectrum is about 
2500. 



4 THE VISIBILITIES COVARIANCE MATRIX 

After binning, the data consist of the binned visibility data 
vector v(vi) (i = l,...,Nf) at each of the Nf observing 
frequencies, together with the corresponding noise vector 
n{vi) at each frequency. To keep our notation simple, we 
combine all the binned visibilities into the single data vector 
v, and all the noise vectors into a single vector n. 

Let us assume that the jth binned visibility is measured 
at a point Uj in the MW-plane and at an observing frequency 
Vj . From this point onwards, we shall denote the jth binned 
visibility by Vj. Similarly, we shall denote the noise on the 
jth binned visibility by Mj and the contribution from the 
sky signal by Sj. With a view to a practical implementation, 
it is easier to pursue the analysis in terms of the real and 
imaginary parts of each visibility, which we write as 

V, = Vf ] +iVf\ 



u (wavelengths) 



Figure 2. An illustration of the positions of the binned visibilities 
resulting from the samples shown in Fig. |I](c) using a cell size 
Am = 3 wavelengths. 



We may also divide the signal and noise contributions S and 
N into their real and imaginary parts in the same way. Thus, 
if the interferometer observation (including all observing fre- 
quencies) consists of total number Nt of complex binned vis- 
ibilities, we adopt the convention that the data vector v is 
of length Nd = 2 At, consisting of first the real parts and 
then the imaginary parts of the visibilities, i.e. 



V 



(R) 



VNt+j 



V 



(I) 



It is also useful to write v = s + n, where the signal vector 
s and noise vector n are defined in a similar way. 

Assuming there are no correlations between the contri- 
butions to visibilities from the sky signal and the instrumen- 
tal noise, the covariance matrix of the interferometer data 
is given by 

C = (W) = (ss') + (nn l ) = S + N, 

where S and N are respectively the covariance matrices of 
the contributions from the sky signal and noise. To an excel- 
lent approximation, N make be taken as diagonal, whereas 
the signal covariance matrix S has the structure 



S = 



g(R,R) 









(9) 



where S, 



(R,R) 



= (S™S™ > etc. 



j - b j ' I — I 

As shown in Appendix [<\l for an observation of a single 

field in the flat-sky approximation, the elements of S may 
be conveniently written in terms of the complex correlators 



A(ui — u, Vi)A*(uj — u, Vj)£(\u\, Vi, Vj) d u, 



A(m — u, vi)A{uj +u,Vj)£(\u\,Vi,Vj) d u, 



where A(u, v) is the aperture function of the interferome- 
ter horns and ^{p,Vi,Vj) is the generalised power spectrum, 
which is defined by 
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N p 

£ 

P =i 



(10) 



We note that, if the primary beam of the interferometer 
horns is is symmetric with respect to inversion through the 
origin, i.e. A(x,v) — A(—x,u), then the aperture function 
A(x,v) is real. In this case, S^ R ' 1 ' 1 and S' ' R ' vanish, and 
so S in ([]) is block-diagonal. 

By introducing polar coordinates (p, 6) into the uv- 
plane, such that u = p(cos 8, sin #)*, we can also write (SiSj) 
and (SiSj) formally as the one-dimensional integrals 



(11) 



(12) 



where the window functions (p) and W-?^ (p) are defined 
respectively by 



A9 A(ui — u, Vi)A*(i 



d8 A(ui — u, Vi)A(iij + u, Vj). 



The advantage of writing (SiSj) and (SiSj) in this way is 
that it provides a natural separation between the influence 
on S of instrumental effects and the underlying power spec- 
trum of the sky. In particular, for an interferometer with a 
given geometry, the window functions Wy (p) and Wy (p) 
are fixed, independent of the underlying sky power spec- 
trum. Thus, for a given array geometry, these quantities need 
only be calculated once. 

As discussed in Appendices ^ and ^, if one performs 
mosaiced observations or includes the effects of non-coplanar 
baselines, the correlators (SiSj) and {SiSj) may still be 
written in the above forms, but with A(u, v) replaced by 
an appropriate effective aperture function Am(u, v). The 
relevant expressions for Aes are given in (B5) and (C5). 
In general, A e g is not a real function, and so the signal 
covariance matrix (^|) is not, in general, block-diagonal. 

4.1 Sparse structure of the covariance matrix 

The expressions ([H]) and ( |l2| ) provide general formulae for 
calculating the element Sij of the signal covariance matrix. 
So far we have assumed that all these elements must be cal- 
culated. If, for example, the number of binned complex visi- 
bilities is about 2500, the real data vector v has about 5000 
elements. Thus, in general, the construction of the (symmet- 
ric) covariance matrix S requires the evaluation of approx- 
imately (5000 x 5000) /2 elements for any given generalised 
power spectrum £(p,Vi,Vj). It is, however, straightforward 
to show that the signal covariance matrix S is in fact very 
sparse and hence the computational requirements are some- 
what reduced. 

The reason for the sparse nature of the signal covariance 
matrix is illustrated in Fig. |^, and is peculiar to observations 
of the CMB made with interferometers. For simplicity let 
us consider a single-field observation with no non-coplanar 
baselines, and focus particularly on two measured visibil- 
ities at positions Ui and Uj respectively in the u-u-planc. 




Figure 3. An illustration of the sparse nature of the signal co- 
variance matrix S for interferometer observations. 



From (^), these visibilities are obtained simply by convolv- 
ing the underlying independent Fourier modes by the aper- 
ture function of the interferometer, which is illustrated by 
the solid shaded circles in the figure. 

Since the extent of the antenna function is determined 
simply by the physical size of the interferometer horns, it is 
identically zero outside some critical radius R c . Hence one 
would expect that, if the separation \ui — Uj\ of the two visi- 
bility samples were greater than 2R C , then the corresponding 
element Sij of the signal covariance matrix should be identi- 
cally zero. An additional subtlety remains, however, because 
the sky is real, and so a{— u, v) = a*(u, v). Consequently, 
correlations exist between areas of the Fourier domain that 
lie on diametrically opposite sides of the origin of the uv- 
plane; this is illustrated in Fig. |^. Thus, the element Sij of 
the signal covariance matrix will be identically zero provided 
Ui and Uj satisfy both the conditions 



\Ui — Uj 
\Ui + Uj 



> 



2R C , 
2R C . 



this occurs when no two shaded 



With reference to Fig. 
discs overlap. 

We may obtain a simple estimate for the sparsity frac- 
tion of the matrix S as follows. Suppose the interferometer 
samples the «u-plane with roughly uniform two-dimensional 
coverage in the range p m in to p m ax- Thus, given the position 
of the sample it;, a second sample Uj may lie anywhere 
in region between the two concentric circles p = p m m and 
p = Pmax shown in Fig. which is of area 7r(p^ ax - p^in)- 
In order for Sij to be non-zero, however, Uj must lie within 
a distance 2R C of the point Ui or —Ui, which corresponds 
to an available area 2ir(2R c ) 2 of the wv-plane. Thus, an es- 
timate of the sparsity fraction f s is given by the ratio of the 
areas, which reads 



8R„ 



n 2 

pmax 



For example, the VSA in compact configuration has R c ~ 
8, pi ~ 20 and p2 « 150 wavelengths and so the sparsity 
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fraction in this case is f s fti 0.02, i.e. only about 2 per cent 
of the elements are non-zero. 

In fact, the fraction of elements Sij that need be cal- 
culated in practice is somewhat smaller than f s . As stated 
above, the element Sij is identically zero if Ui and Uj sat- 
isfy the constraints, with R c equalling the full extent of the 
aperture function. For most interferometers, however, the 
aperture function falls close to zero some way before R c . 
Thus, to a very good approximation, Sij ~ if 

lui — «,-| > 2aR c , 



\ui + Uj\ > 2aR c 



(13) 
(14) 



where a may be taken to be somewhat less than unity. The 
resulting sparsity fraction f s is then lowered by a factor a 2 . 

In particular, as we describe below, many interferome- 
ters have an aperture function that are well described by a 
(truncated) two-dimensional Gaussian. In this case, we find 
empirically that an excellent approximation to S is obtained 
by assuming to be zero all those elements Sij for which ( |l3| ) 
and (Q are satisfied, where a is chosen so that aR c corre- 
sponds to the radius at which the Gaussian aperture func- 
tion drops to 1/e of its peak central value. This leaves only 
a very small percentage of matrix elements that must be 
calculated by evaluating the integrals (0) and ©. On us- 
ing this approximation for S in the subsequent maximum- 
likelihood estimation of the CMB power spectrum, discussed 
later in section |(| we find that the flat band-power estimates 
obtained in each £-bin alter by less than 1 per cent, as com- 
pared to the case in which we set a — 1. 



4.2 The flat band-power parameterisation 

It has become standard practice to parameterise the CMB 
power spectrum in terms of flat band-powers as follows. The 
range of p — \u\ over which we wish to estimate the power 
spectra of each component is divided into Nb separate bins. 
These bins correspond to separate annuli in the uu-plane, 
centred on the origin. One then assumes that the quantity 

for each physical component (p = 1,2,..., N p ) is constant 
within any given spectral bin. We denote the constant value 
of D^(p, va) in the 6th bin by D b v \ and these quantities 
are the parameters whose values are to be determined from 
the likelihood analysis described in section [E|. Clearly, the 
total number of parameters is NbNp. 

This parameterisation of the component power spectra 
has the advantage that, once the elements of signal covari- 
ance matrix have been calculated for one set of band-power 
values D b , it may be evaluated for any other set at almost 
no extra computational cost. This is easily demonstrated by 
first introducing the top-hat function T(a,b), which equals 
unity in the range a < p < b and is zero elsewhere. We may 
then write D^ p ' (p, uo) in terms of our model parameters D b P 



D (p) (p,^o) = ^Dl p) r( Pi) ,p 6+1 ), 

6=1 

where p b the lower limit of the bth bin. Therefore, the gen- 
eralised power spectrum £(p,Vi,Vj) in ( p^|) is now given by 



t(p,Vi,Vj) = ~2 / ] fpMfp / ] D b T{p b ,p h+1 ). 

p=l b=l 

Hence the expressions ( [tl| ) and ( |l2| ) can be written as 

N p N b 

(SjS*) = Y,^^Y^^ K if^ ph +^ (15) 

p=l 6=1 

N p N b 

(SiSj) = ^/,(^)/ p (^)^4 P) < 2) (P6,^ + i), (16) 

p=l 6=1 

(17) 



where the sets of integrals (p b , pt+i) (for 
given by 



K \t' \pb, Pb+i) 



Pb+i 



dp, 



1,2) are 



(18) 



As stated earlier, for a given array geometry, the window 
functions w£\p) (r = 1,2) are fixed. Thus the integrals 

JQj (p6,p6+i) need only be evaluated once, at the start of 

— (p) 

the calculation. Then for any values of the parameters D b , 
we may use ( |l5| ) and jl6| ) to evaluate the elements of the 
signal covariance matrix S. 

We may choose the width of the bins arbitrarily, but 
an obvious choice is the characteristic width of the aper- 
ture function of the interferometer (for example, the 1/e 
width). Since this width is the typical correlation length 
in the convolved Fourier domain, the errors on the derived 
values of the flat band-powers in different bins will be quasi- 
uncorrelated. The compact VSA, for example, samples the 
range p w 20-150 wavelengths and the aperture function for 
a single field observation has a 1/e width of about 15 wave- 
lengths. Thus, the total number of spectral bins is N b ~ 10. 



4.3 Gaussian primary beam 

So far we have not assumed a specific form for the primary 
beam A(x,u) of the interferometer horns. Nevertheless, for 
many interferometers, the primary beam may be modelled 
to a good approximation as a two-dimensional Gaussian. 
Thus, the primary beam at the ith observing frequency Vi 
by 

A(x,Ui) = exp(~\x\ 2 /2a 2 ), 

where o~i is the dispersion of the primary beam at this fre- 
quency. The aperture function is simply the Fourier trans- 
form of the primary beam, and is given by 



-2tt 2 ct 2 \u\ 2 ) 



A(u,Vi) = 27TCT; exp( 

As we show in Appendix |d| this assumption allows us 
to obtain analytical expressions for the window functions 
Wjp(p) (r = 1,2), even for mosaiced observations and in- 
cluding non-coplanar baselines. Thus the integrals K^\p) 
in ( |l8| ) can be evaluated quickly and straightforwardly us- 
ing standard numerical quadrature techniques. For a com- 
pact VSA observation of a single field, the calculation of the 
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integrals corresponding to non-zero elements of the signal 
covariance matrix S require about 2 mins of CPU timer] 



5 LIKELIHOOD ANALYSIS 



As discussed in the start of section 3.2, the basic data con- 



sist of the (real) binned visibility data vector v of length 
Nj,, and the associated (real) residuals covariance matrix 
N = (nn 1 ). Assuming that the underlying Fourier modes 
of the sky and the instrumental noise are both Gaussian- 
distributed with zero mean, the likelihood of obtaining the 
data, given some particular set of model parameter values 
a, is 



C(y\a) 



(2n) N d/2\C(a)\ 1 /- 



■exp[— hv t C 1 (a)v] 



where C(a) = S(a) + N is the sum of the signal and noise 
covariance matrices. In our case, the vector of model param- 
eters a contains the Nb x N p flat band-powers D b in each 
spectral bin for each physical component of sky emission and 
the structure of the corresponding signal covariance matrix 
S(a) is discussed in the previous section. In fact, in order 
to avoid numerical instabilities, we adopt the standard tech- 
nique of working instead with the log-likelihood function 

\n£(v\a) = constant - § [ln|C(o)| + v t C~ 1 (a)v] . (19) 

The aim of any likelihood analysis is to find the param- 
eter values a that maximise the log-likelihood function, and 
also obtain an estimate of the accuracy with which these pa- 
rameters have been determined. There exist several different 
strategies for achieving these goals, which we now discuss, 
with particular emphasis on the computational cost of each 
approach when applied to interferometer data. 



5.1 Evaluation of the multi-dimensional likelihood 

Ideally, one would like to evaluate the full (log-)likelihood 
function over some hypercube in the space of parameters 
a. In this way, the location of the (global) maximum is ob- 
tained immediately, and the presence of multiple subsidiary 
maxima is readily observed. Also, if one is interested only in 
(say) the CMB power spectrum, one can integrate out (or 
marginalise) the parameters describing the power spectrum 
of Galactic emission in a straightforward manner. Moreover, 
marginalised distributions for individual parameters can be 
calculated trivially, in order to obtain confidence limits. 

Unfortunately, evaluation of the full likelihood function 
on a hypercube in parameter space is numerically unfeasible 
when the number of parameters is large. In our case, the pa- 
rameter space has N),N P dimensions. Thus, if the likelihood 
function were evaluated at M points along each parameter 
axis, the total number of evaluations of the log-likelihood 
function @ is M NbNp . Even if one assumes that the CMB 
is the only physical component of emission (i.e. N p — 1), 
the dimensionality of the parameter space is equal to the 
number of spectral bins in which one wishes to estimate the 



* Throughout this paper CPU times refer to computations per- 
formed on a single Intel Pentium III 1 GHz processor. 



CMB power spectrum. For the latest generation of CMB in- 
terferometers JVb ~ 10, and so the parameter space has a 
high dimensionality. If, in this case, the likelihood function 
were calculated at only M — 10 points on each 'axis', this 
requires 10 10 evaluations of the function (|li|). 

Nevertheless, we note in passing that, with the ad- 
vent of faster computers and efficient algorithms, it has re- 
cently become possible to sample directly from a multidi- 
mensional likelihood function using Markov-Chain Monte- 
Carlo (MCMC) techniques (see e.g. Christensen et al. 2001). 
Even in a space of large dimensionality, these algorithms al- 
low one to construct accurate one-dimensional marginalised 
distributions for each parameter, using relatively few evalu- 
ations of the likelihood function (typically ~ few x 1000). 
The application of MCMC techniques to the estimation of 
the CMB power spectrum from interferometer data will be 
presented in a forthcoming paper. 



5.1.1 Evaluation of the log-likelihood function 

Despite the difficulties associated with the direct calculation 
of the full likelihood over a hypercube in parameter space, 
it is necessary for our later discussion to consider the com- 
putational task associated with evaluating the log-likelihood 
function (^) for a given set of parameter values a. 

First, one must calculate the corresponding covariance 
matrix C(a) = S(a) + N . Since N is constant, one needs 
only to calculate the signal covariance matrix S(a). As dis- 
cussed in section 4.2, after evaluating the integrals (b_8|) once, 
at the start of the calculation, the signal covariance matrix 
S(a), for a given set of parameter values a, may be cal- 
culated very quickly using © and @. Since, for most 
interferometers, N is diagonal to a very accurate approxi- 
mation, C(a) inherits the sparse structure of S(a) described 
in section 4.1. 



Once C has been calculated, one proceeds to evaluate 
the log- likelihood function (^), which requires the evalua- 
tion of the quadratic form and the determinant CI. 
The standard procedure is first to perform the Cholesky de- 
composition 



(20) 



where L is a lower triangular matrix. The determinant |C| 
is then given by 

\C\ = \LL*\ = \L\ 2 , 

where, since L is lower triangular, \L\ is simply the product 
of its diagonal elements. To evaluate the quadratic form, one 
then solves for x the lower triangular system 



Lx — v, 



(21) 



which is computationally straightforward. Finally, one forms 
the scalar product of x with itself to obtain 

x l x — v* (L^ 1 ) t L~ v = v (LL t )~ 1 v = v t C~ 1 v. 

By far the most computationally intensive step in the 
calculation is the Cholesky decomposition (^o|). For a com- 
pact VSA observation of a single field, for example, the (real) 
binned visibility data vector v contains about Nd — 5000 
elements, so that C has dimensions 5000 x 5000. Using a 
standard LAPACK subroutine (Anderson et al. 1999) to per- 
form the decomposition requires about 5 mins of CPU time. 
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The subsequent solution of the lower-triangular system (|21j) 
is much faster, requiring only 0.25 sees of CPU time. Hence, 
the total time required to evaluate the log-likelihood func- 
tion @ for a given set of parameters a is around 5 mins. 
We therefore see that the evaluation of the full likelihood 
function over some hypercube in parameter space is compu- 
tationally unfeasible. 

We note, in passing, that the LAPACK library uses the 
highly-optimised Basic Linear Algebra Subroutines (blas) 
for simple operations. In practice, this combination pro- 
vides the fastest readily-available dense matrix computa- 
tional subroutines. As a comparison, performing the same 
calculation of the log-likelihood using simple unoptimised 
routines, such as those presented in Press et al. (1994), re- 
quired approximately 20 times the CPU time on an equiva- 
lent processor. 



5.1.2 Sparse matrix conjugate-gradient algorithm 

As stated earlier, the covariance matrix C is very sparse for 
interferometer data. Therefore, using standard dense matrix 
subroutines, such as those in the LAPACK library, is wasteful 
both computationally and in terms of memory usage. One 
should instead employ sparse matrix algorithms, which take 
advantage of the sparse structure of C and require only the 
storage of its non-zero elements (plus an integer array to 
store their positions in the matrix). It was found that the 
most computationally efficient approach was to solve the 
linear system 



Cx = v, 



(22) 



with a preconditioned conjugate-gradient algorithm that 
performed its internal matrix and vector operations using 
sparse matrix routines. In particular, the preconditioner was 
chosen to be the sparse incomplete Cholesky decomposition 
of C. This matrix has the form 



C = LDL\ 



(23) 



where D is a diagonal matrix, and L is lower triangular 
with unit diagonal elements and which has the same sparse 
structure as the lower triangle of C (which is symmetric). 
Since one imposes this sparse structure on L, the matrix 
C is only an approximation to C (see Greenbaum 1997). 
Nevertheless, the approximation is sufficiently accurate to 
provide an extremely effective preconditioner, which allows 
the conjugate-gradient algorithm to converge to the solution 
a; in a just a few iterations. Moreover, it was found that the 
preconditioner C was a sufficiently accurate approximation 
to C that the determinant 



\C\ = \LDL t \ = ILII-DIIL* 



\D\ 



differed by less than 0.01 per cent from the true determinant 
C|. Since D is diagonal, its determinant is trivially eval- 
uated. Thus, the sparse matrix preconditioned conjugate- 
gradient algorithm provides an efficient method of calculat- 
ing the complete log-likelihood function (pj|). 

As in the dense matrix approach, the most computa- 
tionally demanding step of the calculation is the construc- 
tion of the incomplete Cholesky preconditioner (^jj). As an 
example, for the 5000 x 5000 covariance corresponding to a 
compact VSA observation of a single field, the calculation 



requires about 30 sec of CPU time. The subsequent pre- 
conditioned conjugate- gradient solution of the linear system 
( p2| ) then requires only about 0.5 sec of CPU time. Thus, 
we see that, in this case, the evaluation of the log-likelihood 
function by this method is approximately 10 times faster 
than the standard dense matrix approach using LAPACK sub- 
routines. For larger problems, the LAPACK routines scale as 
O(iVj), where Nd is the size of the data vector, whereas the 
sparse matrix algorithm scales as 0(fs^ 2 Nd), where f s is the 
sparsity fraction of the covariance matrix C. Unfortunately, 
the speed-up provided by the sparse matrix approach is still 
insufficient to enable the calculation of the full likelihood 
distribution over some hypercube in parameter space. 

5.2 Numerical maximisation of the likelihood 

When it is unfeasible to calculate the full likelihood func- 
tion directly over the parameter space, one usually resorts 
to numerical maximisation. The main disadvantage of this 
approach is that most standard methods will converge to the 
nearest local maximum, which need not be the global maxi- 
mum. One would hope, however, that the likelihood function 
is sufficiently structureless that this is not a problem for a 
reasonable starting guess for the CMB power spectrum. 

Standard numerical maximisation (minimisation) algo- 
rithms use differing amounts of gradient and/or curvature 
information in order to arrive at the maximum point. For 
example, methods such as the downhill simplex or Powell's 
direction-set algorithm use no gradient information, requir- 
ing only function evaluations (see Press et al. 1994). Other 
methods, such as the variable metric technique, use only gra- 
dient information, whereas the standard Newton-Raphson 
algorithm requires both gradient and curvature informa- 
tion. In particular, Newton-Raphson technique and Pow- 
ell's direction-set algorithm were investigated as methods of 
numerically maximising the likelihood function. 

5.2.1 The Newton-Raphson method 

The Newton-Raphson method (or variations thereon) is 
the standard numerical minimisation technique used in 
maximum-likelihood estimation of the CMB power spec- 
trum (e.g. Borrill 1999). Starting from some initial guess 
ao for the flat band-powers, one performs the iteration 

a n+ i = a„ + Sa n , (24) 

where the correction 5a n is given by 

5a n = — [H 1 g] a= a n , 

in which g — V(ln£) is the gradient vector and H = 
VV(ln£) is the curvature (or Hessian) matrix. The itera- 
tion (|5|) is continued until the algorithm has converged to 
the required accuracy. The elements of the gradient vector 
and curvature matrix of the log-likelihood function ( Jis| ) are 
easily shown to be given respectively by 



d\n£ 

d 2 ln£ 

da k da' 



v*C- 



L dS | 

da k 



TilC 



da k 



dS 



da k 
_i OS 



C~ L v 



-i dS 

da k 



\ da k day 



(25) 



(26) 
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where Tr( ) denotes the trace of a matrix. 

The implementation of the Newton-Raphson algorithm 
can be divided neatly into a series of computational steps 
(see Borrill 1999). For the analysis of interferometer data, 
each iteration of the algorithm consists of the following 
steps. 

(i) Calculation of the signal covariance derivative matri- 
ces dS /da k (k = 1, . . . , TV), where TV is the total number of 
parameters. In the flat band-power parameterisation of the 
power spectra, we see from (^HJ) and ([l6]) that S is linear in 
the parameters a. Thus, the derivative matrices are simply 
appropriate linear combinations of the integrals in 
which are evaluated once at the start of the calculation. 

(ii) Construction of the (sparse) covariance matrix 



_ _ \ ^ OS 
C = N+} a k — 
£-^t da i. 



da k ' 
fc=i 

calculation of its (incomplete) Cholesky decomposition and 
solution of the linear system Cx = v. 

(iii) Solution of the linear systems 

OS 

da k 

where Y k is a matrix of the same dimensions as C. Each 
column of Y k is solved for using the (incomplete) Cholesky 
decomposition obtained in step (ii). 

(iv) Assemble the elements of the gradient vector 

dlnC 



CY k 



(k = l,...,N), 



9k 



0, 



a l 



= | [v'YkX - Tr{Y k )] 



H k y 



v) Assemble the elements of the curvature matrix 
d 2 ln£ 



-v*Y k Y v x + ±Tr(YkY k 



** -■- K -■• K' ' O IS-* k ' ) • 

oa k aa k i 

(vi) Calculate the correction vector 5a — —H~ 1 g. 

Although it is not necessary for the Newton-Raphson 
algorithm, we note that in step (ii) one performs all the nec- 
essary calculations to evaluate the log-likelihood function 
In C itself. This can be useful for checking that its value is, in 
fact, decreasing with each iteration. Typically, the Newton- 
Raphson algorithm converges to the maximum-likelihood so- 
lution in approximately 5 iterations. 

The computationally most demanding calculations in 
each iteration are steps (ii) and (iii) above, which may be 
performed using either the dense or sparse matrix tech- 
niques disc ussed in the previous section. As mentioned in 
for a visibility data vector v of length 5000, 



5.1 



section 

the calculation in step (ii) of the (incomplete) Cholesky de- 
composition requires approximately 5 min of CPU time for 
the dense matrix LAPACK routines and about 30 sec for the 
sparse matrix algorithm. The subsequent solution of the lin- 
ear system Cx — v then requires about 0.25 sec or 0.5 
sec respectively. Thus, it would clearly be faster to use the 
sparse matrix technique in step (ii). 

In step (iii), however, one must solve a large number 
of linear systems, namely one for each column of Y k for 
k — l,...,N, using the (incomplete) Cholesky decomposi- 
tion calculated in step (ii). Even assuming the sky emission 
is due only to the CMB, the total number of linear systems 
is NdNb, where is the length of the data vector v and 
Nb is the number of spectral bins in which one wishes to 



calculate the CMB power spectrum. For a compact VSA 
observation of a single field, Nd ~ 5000 and TV, ~ 10, and 
so one must solve ~ 50000 linear systems. Solving each sys- 
tem requires around 0.5 sec of CPU time using the sparse 
conjugate- gradient algorithm, but only 0.25 sees using the 
LAPACK dense matrix routines. Taking step (ii) and step (iii) 
together, it is therefore faster to use the LAPACK routines, 
which require a total of about 3.5 hrs CPU time to com- 
plete these steps. The remaining steps (iv) - (vi) require 
only around 0.5 hr of CPU time, so that a full Newton- 
Raphson iteration takes around 4 hrs. Since 5 iterations 
of the Newton-Raphson algorithm are typically required 
for convergence, the total CPU to calculate the maximum- 
likelihood solution is around 20 hrs. 

We note that Bond, Jaffe & Knox (1998, 2000) propose 
a variant of the Newton-Raphson algorithm, in which steps 
(v) and (vi) are different. In particular, the correction vector 
in step (vi) is replaced by 

5a = —F~ 1 g, 

where the Fisher matrix F is the expectation value of the 
curvature matrix. From (|26|), the Fisher matrix has the ele- 
ments 

F kk , = (H W ) = |Tr (c-'^-C- 1 ^- 
V oa k day 

Thus, step (v) above is replaced by 



F kk , = \Tt{Y k Yy). 



(27) 



Comparing the expressions for H kk i and F kk i we see that 
some saving is made by using the latter. From (p7[), how- 
ever, we note that the calculation of the Fisher matrix still 
requires knowledge of the matrices Y k (k — 1, . . . , TV) cal- 
culated in step (iii), which is the computationally most de- 
manding part of the calculation. Hence, the use of the Fisher 
matrix, rather than the full curvature matrix, leads to a re- 
duction of only around 20 mins in the CPU time required 
for each Newton-Raphson iteration. 

5.2.2 Powell's direction- set method 

Clearly, the Newton-Raphson algorithm described above is 
computationally intensive. Moreover, most time is spent is 
computing quantities necessary for calculating the gradient 
and/or cur vatur e of the log-likelihood function. As discussed 
in section 5.1.2, however, the log-likelihood function itself 



can be calculated using the sparse preconditioned conjugate- 
gradient method in about one-tenth of the CPU time re- 
quired by the standard LAPACK dense matrix routines. These 
observations suggest that it might be more computationally 
efficient to maximise the log-likelihood function using an 
algorithm that requires only function calls, rather than gra- 
dient and curvature information. 

A particularly efficient and robust estimator of this type 
is Powell's direction-set method, which is described in detail 
by Press et al. (1994). At each iteration, the algorithm de- 
termines a set of 'conjugate' directions at the corresponding 
point a in the parameter space, and minimises the function 
along each direction in turn. The method can be shown to 
be quadratically convergent, so that an exact quadratic form 
would be minimised in TV(TV + 1) line-minimisation, where 
TV is the dimensionality of the parameter space. Typically, 
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each line-minimisation requires 2 or 3 function calls. Thus, 
one would expect between 2N(N + 1) and 3N(N + 1) func- 
tions calls would be required to minimise an exact quadratic 
form. 

Since the algorithm requires only function evaluations, 
one may straightforwardly e mploy the sparse matrix tech- 
nique discussed in section 5.1.2] for evaluating the log- 



likelihood function. Assuming the sky emission to be due 
solely to the CMB, the dimensionality of the parameter 
space is simply the number of spectral bins in which one 
wishes to estimate the CMB power spectrum, i.e. N = Nb- 
For a compact VSA observation of a single field, Nb w 10 and 
it was found that the algorithm converged to the maximum- 
likelihood solution after m 300 evaluations of the likelihood 
function. This is in broad agreement with the anticipated 
number of function evaluations quoted above, although the 
presence of the term ln|C| in the log-likelihood function 
([li^) means that it is not a simple quadratic form. Thus, 
since each function evaluation requires 30 sec of CPU time, 
the maximum-likelihood CMB power spectrum may be ob- 
tained in about 2.5 hrs of CPU time, which is considerably 
faster than the Newton-Raphson algorithm. 

We note, in passing, that the downhill-simplex minimi- 
sation algorithm (see Press et al. 1994) was also investi- 
gated. This again uses only function calls, but it was found 
to converge too slowly. Nevertheless, a robust technique for 
obtaining a accurate minimum in a reasonable amount of 
CPU time is first to employ Powell's direction-set algorithm 
until it converges to some point a in parameter space, and 
then to restart the minimisation process from this point us- 
ing a downhill simplex. The final simplex step is then very 
fast and provides a robust estimate of the minimum point. 
This algorithm is implemented as the program maxlike 
in the Microwave Anisotropy Dataset Co mputational sOft- 
Ware (madcow) package (see section 5.4). 



5.2.3 Estimation of errors on parameter values 

A major disadvantage of numerically maximising the likeli- 
hood is that one remains ignorant of the global shape of the 
distribution. Therefore, one cannot calculate marginalised 
distributions for each parameter, in order to obtain proper 
confidence limits. Nevertheless, one would hope that the 
shape of the likelihood function C near its peak a should 
be well-approximated by a Gaussian, so that 



In C(a + 8a) — In C(a) + \8a l H(a)Sa, 



(28) 



where H is the curvature (or Hessian) matrix given in (|26|), 
and the gradient term in the Taylor expansion vanishes 
since & is a maximum. Exponentiating J28|), we immedi- 
ately recognise that covariance matrix of our errors is given 
simply by the inverse of the curvature matrix at the peak, 

(SaSa*) = JT _1 (a). 

The curvature matrix H(a) may be calculated directly 
by performing steps (ii), (iii) and (v) in the Newton- 
Raphson algorithm at the maximum point a. From sec- 
tion 5.2.1 , we note that, for a compact VSA observation 



numerically by performing second differences along each pa- 
rameter direction. An appropriate step size is easily chosen, 
and the algorithm requires ^N(N + 3) + 1 function eval- 
uations. Thus, for 10 spectral bins, one must evaluate the 
log-likelihood function around 70 times. Using the sparse 
preconditioned conjugate- gradient algorithm, this requires 
only about 30 mins of CPU time. Comparing the resulting 
curvature matrix with that obtained directly from (^), we 
find that each element is agrees to within 0.1 per cent. The 
numerical calculation of the curvature matrix and its inver- 
sion to produce the errors covariance matrix is implemented 
in the routine errors in the MADCOW package. 

5.3 Evaluation of the likelihood function along 
parameter axes 

The Powell direction-set minimiser, together with the sparse 
preconditioned conjugate-gradient algorithm for evaluating 
the log-likelihood function, provides a method of obtaining 
the maximum-likelihood power spectrum and the errors co- 
variance matrix in a reasonable amount of CPU time. Never- 
theless, by taking full advantage of the sparse nature of the 
visibilities covariance matrix C and of the flat band-power 
parameterisation of the power spectrum, it is possible to cal- 
culate the maximum-likelihood solution in considerably less 
computing time. 

In the flat band-power parameterisation, one divides 
the observed p-range into spectral bins of width Ap. This 
corresponds to dividing the uw-plane into a set of concen- 
tric annuli in each of which the power spectrum D^ v \p) = 
p 2 C^ p '(p, vo) of each physical component in assumed con- 
stant, and its amplitude is the parameter to be determined. 
A typical annulus is illustrated by the solid circles in Fig. ^j, 
which are overlayed on the visibilities samples shown in 
Fig. [| The extent of the aperture function for a compact 
VSA observation of a single field is plotted as the solid disc 
in the figure. 

an interferometer visibility 



As discussed in section 2.2 



of a single field with 10 spectral bins, this requires about 4 
hrs of CPU time. However, we find that a computationally 
more efficient procedure is to calculate the curvature matrix 



measurement is only sensitive to the underlying sky Fourier 
modes that lie within the aperture function at that point 
in the uw-plane. Thus, only a fraction of the total number 
of measured visibilities are sensitive to the amplitude of the 
power spectrum (of either the CMB or some foreground com- 
ponent) in a particular annulus. This is illustrated in Fig. ^j, 
where only the visibilities lying in the region between the 
two dashed circles are sensitive to the level of the power 
spectrum in our typical annulus. 

Thus, provided one is calculating the log-likelihood 
function along the 'axes' in parameter space, so that only 
one parameter value is changing at any one time, the ef- 
fective size of the dataset is much smaller than the total 
number of data points. Thus, if the parameters ay-t k are 
fixed, 

\nC(v\a,k) = \nC[vk\a,k), 

where Vk is the reduced data vector consisting of only those 
visibilities sensitive to the flat band-power a^. Given the 
geometry illustrated in Fig. ^, the length of the reduced 
data vector will, in general, depend on the radius of the 
corresponding uw-annulus, the extent of the aperture func- 
tion and the positions of the measured visibilities in the uv- 
plane. For a compact VSA observation of a single field, using 
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Figure 4. An illustration of a typical annulus in the Mti-planc, 
with which the power spectrum is assumed constant (solid cir- 
cles). This is overlayed on the binned visibility samples shown 
in Fig. ^. The aperture function is plotted as a solid disc. Only 
visibilities lying in the region between the two dashed circles are 
sensitive to the level of the power spectrum in the annulus. 



Nb = 10 spectral bins (or itu-annuli), the reduced data vec- 
tor typically contains around 3-10 times fewer data points 
than the full data vector. 

The smaller size of the reduced data vector, as com- 
pared to the full data vector, leads to a significant increase in 
the speed with which the log-likelihood function can be cal- 
culated along the axes in parameter space. For the standard 
LAPACK dense matrix routines, the cost of evaluating the log- 
likelihood function scales as 0(7V|), where Nd is the length 
of the data vector. Thus, by using the reduced data vector, 
the computation time for evaluating the log-likelihood func- 
tion is reduced by a factor between 30 and 1000, depending 
on the annulus in question. 

Unfortunately, the same speed-up factor is not obtained 
for the sparse preconditioned conjugate-gradient algorithm. 
This reason for this is clear from Fig. ^. Since the reduced 
data vector contains only visibilities lying in the region be- 
tween the two dashed circles, the corresponding sparsity 
fraction /„, will increase significantly. Since the sparse ma- 
trix algorithm scales as 0{f^ 2 Nf), the log-likelihood evalu- 
ation is speed up only by a factor between 4-60, depending 
on the annulus. Nevertheless, for a compact VSA observa- 
tion of a single field, the sparse matrix approach is still about 
a factor of 3 times faster in evaluating the log-likelihood 
function than the LAPACK dense matrix algorithm, when 
timings are averaged over the 10 spectral bins. 



5.3.1 Calculation of the maximum-likelihood solution 

The speed-up outlined above is only possible when one varies 
only a single parameter at, while keeping the others fixed. 
Thus, one cannot use a general minimisation algorithm such 
as the Newton-Raphson or Powell direction-set method, 
since these perform iterations of the form 



where 5a n may be in some general direction in parameter 
space. A straightforward solution is provided, however, sim- 
ply by using a restricted version the Powell direction-set 
algorithm in which the 'conjugate' directions are forced to 
remain along the parameter axes. Starting with some initial 
guess ao for the solution, this procedure is equivalent to per- 
forming, in turn, independent line-minimisations along each 
parameter axis, while keeping the other parameters fixed at 
their initial values. One then updates the whole solution vec- 
tor and repeats the process until convergence is obtained. It 
was found that the likelihood function was sufficiently struc- 
tureless that this restricted 'Powell' algorithm still converged 
in around 3N(N + 1) functions evaluations, where N is the 
number of parameters a. 

For a compact VSA observation of a single field, with 
10 spectral bins, the maximum-likelihood power spectrum 
is obtained in about 10 mins of CPU time using the sparse 
matrix evaluation of the log-likelihood function, or about 45 
mins using standard LAPACK dense matrix routines. In both 
cases, this is cle arly faster than either of the methods dis- 
cussed in section |5.2| , and is hence our method of choice for 
determining the maximum-likelihood power spectrum from 
the current generation of CMB interferometers. The sparse 
and dense matrix versions of the algorithm are implemented 
respectively in the programs spf astlike and laf astlike in 
the MADCOW package. 



5.3.2 Estimation of errors on parameter values 

Once the maximum-likelihood solution a has been obtained, 
it still remains to calculate the errors on our estimated flat 
band-powers. One can, of course, simply calculate the cur- 
vature matri x H(a) numerically, in the manner discussed in 
section 5.2.3, which requires about 30 mins of CPU time for 



a compact VSA observation of a single field, with 10 spectral 
bins. This approach does, however, assume that the likeli- 
hood function is well-approximated by a Gaussian near its 
peak, and leads to symmetric error bars on our estimated 
parameter values. This may not, in fact, be a valid approx- 
imation, especially for poorly constrained parameters. 

An alternative approach is to make use of the fact that 
it is usual to choose the width Ap of the spectral bins to 
correspond to the characteristic width of the aperture func- 
tion in the uv-pl&ne. Let us suppose that the sky emission is 
due only to the CMB. In this case, all the flat band-power 
estimates a will be close to uncorrelated. This is equivalent 
to the eigenvectors of H(a) lying close to the directions of 
the axes in parameter space, and the off-diagonal elements 
being very small as compared with those on the leading di- 
agonal. In this case, a better representation of the errors on 
our estimates is obtained by evaluating the log-likelihood 
function along each parameter direction in turn, through 
the point a. Thus is illustrated in Fig. ^. In the ideal case, 
where the parameters (k — 1,...,N) are independent, 
the resulting curves would be the marginal distributions of 
each parameter. 

In order to obtain accurate confidence intervals on each 
parameter, it is best to calculate the log-likelihood function 
at a large number of points along each parameter direction. 
Although, using the reduced data vector approach discussed 
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Figure 5. An illustration of calculating the likelihood function 
along each parameter direction, through the peak d. 



above, it is relatively cheap to evaluate the log-likelihood 
function along each parameter direction (while keeping the 
other parameters fixed) , if one desires more than around 100 
points along each direction, a more efficient procedure is to 
rotate to the signal-to-noise eigenbasis for each parameter 
in turn, as discussed in Appendix |e| Once this rotation has 
been performed for a given parameter ah , the log-likelihood 
function can be calculated at a large number of points along 
this parameter direction at virtually no additional compu- 
tational cost. 

Rotating to the signal-to-noise eigenbasis requires the 
solution of a generalised eigenproblem of dimension equal to 
the length of the reduced data vector vt for each parame- 
ter. In the absence of a stable sparse matrix implementation 
of this algorithm, a standard LAPACK subroutine was used 
to perform the calculation. For a compact VSA observation 
of a single field, with 10 spectral bins, the entire calcula- 
tion requires about 30 mins of CPU time. This approach 
is implemented as the routine eigenlike in the MADCOW 
package. 

In fact, the routine eigenlike may be used in isolation 
to obtain both the maximum-likelihood solution d, and the 
likelihood functions along each parameter direction passing 
through the maximum. Starting with some initial guess ao 
one performs a single 'iteration' by calculating the likelihood 
functions along the parameter directions through ao. The 
peak position of the likelihood in each direction is then used 
to update the solution vector and the process is repeated 
until it converges. For observations of a single field conver- 
gence is usually achieved after only 3 iterations. Thus, for a 
compact VSA observation of a single field, with 10 spectral 
bins, the full calculation requires about 1.5 hrs of CPU time. 
This is clearly slower than using spf astlike or laf astlike, 
but provides a useful cross check of one's results, since it uses 
entirely different computational subroutines. 



5.3.3 Marginalisation over Galactic parameters 

For illustration, we assumed above that the sky emission 
was due solely to CMB fluctuations, without any contami- 
nating Galactic foreground emission. Let us now suppose in- 
stead that there is, in fact, substantial foreground emission 
coming from (say) a single Galactic component. If multi- 



frequency observations are available, one may calculate the 
maximum-likelihood solution d for the CMB and Galactic 
flat band-powers in each spectral bin, using any of the tech- 
niques outlined above. 

Suppose the parameters at and ay correspond respec- 
tively to the CMB and Galactic flat band-powers in the 
fcth spectral bin. If the bin width corresponds to the ex- 
tent of the aperture function, these two parameters will be 
quasi-uncorrelated with the remaining ones, but the param- 
eters ak and ay themselves will be strongly correlated since 
they refer to the same spectral bin. Nevertheless, the rou- 
tine eigenlike provides a natural method for marginalising 
over the flat band-power of the Galactic (or CMB) compo- 
nent in each bin. Keeping the parameters ay' (k" 7^ k and 
k" 7^ k') fixed, one can calculate the log-likelihood func- 
tion in the two-dimensional (ak, a^/)-space by performing a 
signal-to-noise eigenmode rotation at each required value of 
ay (say), and calculating the variation with respect to ak at 
almost no extra computational cost. One can then perform 
a numerical integration over at or ay, thereby obtaining a 
marginalised likelihood function for the other parameter. 



5.4 Implementation: the MADCOW package 

The Microwave Anisotropy Dataset Computational sOft- 
Ware (madcow) package consists of the 5 programs 
maxlike, errors, spfastlike, lafastlike and eigenlike, 
which are discussed above. The first three programs are 
based on the sparse matrix conjugate-gradient algorithm 
for c alcula ting the log-likelihood function, discussed in sec- 
tion 5.1.2, whereas the last two programs calculate the log- 
likelihood function using LAPACK subroutines. 

All the above programs have been parallelised using 
MPI, in order that they may be run on distributed-memory 
parallel machines. The parallelisation of the routines 
lafastlike and eigenlike makes use of the ScaLAPACK 
parallel version of the LAPACK library. The speed of both rou- 
tines is found to scale linearly with the number of processors 
up to the maximum of 8 CPUs available on the Linux Be- 
owulf cluster at our disposal. The parallelisation of the spare 
matrix programs maxlike, c, errors . c and spf astlike, c 
is less straightforward, since existing parallel versions of the 
relevant routines are not available. Some parallelisation of 
the algortihms has been performed, but the resulting pro- 
grams do not scale as effectively as those using LAPACK rou- 
tines. A method of achieving linear scaling of the sparse 
matrix algorithm is currently under investigation. A /3-test 
version of the MADCOW package may be downlo aded from 
tittp : //www.mrao . cam. ac .uk/ sof tware/madcow/. 



6 APPLICATION TO SIMULATED 
OBSERVATIONS 

As an illustration of our maximum-likelihood methods, in 
this section we apply them to simulated VSA observations. 
The VSA is a 14-element radio interferometer, which is ca- 
pable of operating at frequencies between 26 and 36 GHz. 
The VSA is currently observing at a frequency of 34.1 GHz. 
Its observing bandwidth is 1.5 GHz, and the system tem- 
perature about 25-35 K. The telescope can observe in two 
different configurations: the compact array uses corrugated 
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Figure 6. The VSA compact array in June 2001. The rectangu- 
lar area indicates the shape of the lower part of the VSA table, 
which is mounted in east-west direction and stcerable in eleva- 
tion. There are 14 horn-reflector antennas mounted at an incli- 
nation of 35 degrees. The dashed lines above the horns show the 
extension of the antenna mount (Rusholme 2001). 
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Figure 7. The VSA primary (power) beam profile (dashed line) 
at 34.1 GHz calculated by a ray-tracing program for the corru- 
gated horn-reflector antennas, compared to a Gaussian (solid line) 
of the same FWHM (4.6 degrees). 



horn-reflector antennas of 14.3 cm diameter and baselines up 
to 1.5 m, and the extended array consists of 32.2-cm horns 
with baselines up to about 3 m. We base our simulations on 
the compact array with the actual antenna arrangement as 
of June 2001, as shown in Fig. |(| The primary (power) beam 
has a FWHM of 4.6 degrees at 34.1 GHz and can be well 
approximated by a Gaussian (Fig. ^). 

We simulate VSA observations of three fields centred 
on RA(1950) = h 20 m 00.0 s , Dec. (1950) = 30° 00' 00", 
RA(1950) = h 07 m 20.6 s , Dec. (1950) = 30° 16' 30" and 
RA(1950) = h 12 m 50.0 s , Dec. (1950) = 27° 48' 00" (the 
VSA1, VSA1A and VSA1B fields respectively). Real VSA 
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Figure 8. The theoretical CDM power spectrum with parameter 
values described in the text. 



observations of these fields (and others) have been completed 
and are currently being analysed. The uu-coverage for a 5 h- 
observation of the VSA1 field was shown in Fig. |l|(c). The 
antenna arrangement yields a reasonably uniform sampling 
in the multipole region £ w 100 — 900. 

In order to simulate an observation of CMB fluctua- 
tions, we first create a 512 x 512 array of Fourier modes, the 
real and imaginary parts of which are drawn from a Gaus- 
sian distribution with a variance given by a theoretical power 
spectrum corresponding to a standard inflationary CDM 
model with a Hubble parameter Ho = 65 km s _1 Mpc -1 
and a fractional baryon density f2b = 0.05. The model is 
spatially-flat with fi m = 0.3 and Q,a = 0.7, a primordial 
scalar spectral index n — 1 and no tensor modes. The spec- 
trum is normalised such that Q rms = 18pK. The theoretical 
Ci spectrum is plotted in Fig. || and the actual realisation 
of the CMB Fourier modes were plotted in Fig. |l|(a). 

These Fourier modes are inverse Fourier-transformed to 
obtain the corresponding CMB anisotropies on the sky. This 
array is then multiplied by the telescope primary beam and 
Fourier transformed to give a regular array in the visibility 
plane (see Fig. |l|(b)). A VSA observation is then simulated 
by sampling the regular array at the required points in the 
itw-plane using a bilinear interpolation. The ttu-positions at 
which the visibilities are sampled correspond to a 64-s inte- 
gration per visibility (see Fig. |l](c) ) . 

Instrumental noise on the data is simulated by adding to 
each visibility a random complex number for which the real 
and imaginary parts are drawn from a Gaussian distribution 
with an rms of 3.5 Jy. Since the measured VSA noise level 
of about 240 Jy/y^s/baseline, this corresponds to ~ 70 days 
of 5-h observations. This is the time typically spent on each 
VSA field during the first year of VSA observations between 
September 2000 and August 2001. 

These simulated VSA data typically contain w 25 000 
64-s visibility samples per day of observation. These visibil- 
ities are binned as discussed in section with a bin size of 
Au = 3 wavelengths. It was checked that gridding with cell 
sizes An = 1 — 6 did not significantly affect the final derived 
CMB power spectrum. The binned data are then analysed 
using the maximum-likelihood techniques described in sec- 
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Figure 9. The maximum-likelihood CMB power spectrum de- 
rived from simulated observation of the VSA1 field. The flat band- 
powers were estimated in 10 spectral bins. The greyscale bars 
show the likelihood function evaluated along the parameter di- 
rections through the maximum-likelihood solution a. 



tion q. The results presented below were obtained using the 
program eigenlike in the MADCOW package. 

We note that, for illustration, in these simulations we 
have restricted ourselves to the case in which the sky emis- 
sion is due solely to CMB fluctuations, without any contami- 
nating foreground emission. Neglecting foregrounds provides 
a convenient way of illustrating the general method, since 
only one observing frequency is required, and one need not 
include the flat band-powers of the Galactic power spectrum 
as parameters in our analysis. Also, in practice, at the ob- 
serving frequency of the VSA Galactic contribution can be 
largely ignored, because the emission coming from Galactic 
dust and free- free radiation is minimal. Moreover, VSA fields 
are chosen to lie in regions of low Galactic emission. Never- 
theless, as presented in section ^, the maximum-likelihood 
methods implemented in the MADCOW package can easily 
accommodate the presence of diffuse Galactic foregrounds. 
Indeed, when multifrequency data are available, one may 
obtain estimates of the power spectra of both the CMB and 
the Galactic foreground emission. 

6.1 Single field observations 

We first consider the case where the power spectrum is es- 
timated from only the simulated observations of the single 
field VSA1. The flat band-powers have been estimated in 10 
spectral bins of width Al « 90 between £ « 80 and £ w 950. 
The resulting likelihood function evaluated along the param- 
eter directions through the maximum point & are plotted as 
the greyscale bands in Fig. |^. 

Dark areas indicate high likelihoods. The extension of 
the shaded areas reflects the uncertainty in the estimated pa- 
rameters. As expected, the uncertainty is larger in the poorly 
sampled lower and higher £-bins, where the telescope sensi- 
tivity is low. On the other hand, the fairly well sampled mul- 
tipoles around I ~ 500 can be determined more accurately. 
All reconstructed band powers are consistent with the the- 
oretical input power spectrum. The bin width was chosen 



i 

Figure 10. The maximum-likelihood CMB power spectrum de- 
rived from simulated observation of the three mosaiced fields 
VSA1, VSA1A and VSA1B. 



to be equal to the 1/e-diameter of the Fourier transform of 
the primary beam. Thus, adjacent bins are only mildly cor- 
related. Indeed, on calculating the curvature matrix H(a) 
one finds that adjacent bins are anti-correlated at only the 
15 per cent level. 

On repeating the likelihood analysis for different real- 
isations of the simulated noise, it is found that results are 
always consistent with the input spectrum and with one an- 
other. It was further verified that approximating the primary 
beam by a Gaussian of the same FWHM does not distort 
the resulting power spectrum. Moreover, the results were 
unaffected by ignoring the finite bandwidth of the receivers 
and assuming a single observing frequency in the centre of 
the band. 



6.2 Mosaicing 

We now turn to the simultaneous analysis of observations 
of the three fields VSA1, VSA1A and VSA1B. Since these 
fields overlap on the sky, the corresponding visibilities are 
thus correlated. The resulting likelihood function evaluated 
along the parameter directions through the maximum point 
& are plotted as the greyscale bands in Fig. jjij. 

Although mosaicing is often used to improve the in- 
strumental resolution in ^-space, we have chosen to compute 
the power spectrum in the same bins as for the single field 
reconstruction, in order to facilitate a direct comparison. 
Moreover, owing to the relatively small size of the compact 
VSA horns, the the aperture function is already sufficiently 
narrow to provide the ^-resolution necessary to resolve the 
expected acoustic peak structure in the CMB power spec- 
trum. As expected, the uncertainty on the flat band-powers 
has been significantly reduced, particularly in the poorly 
sampled multipole regions. Moreover, the anti-correlation 
between adjacent bins has reduced to around the 10 per 
cent level. 
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7 CONCLUSIONS 

In this paper, we have investigated maximum-likelihood 
methods for estimating the power spectrum of the cosmic 
microwave background (CMB) from interferometer observa- 
tions. In particular, we have considered the computational 
efficiency of several techniques for obtaining flat-band power 
estimates in a number of spectral bins, together with con- 
fidence limits on the power in each bin. For multifrequency 
data, one may also estimate the flat band-powers of Galac- 
tic foreground emission in each spectral bin. The methods 
developed may be applied to single-field or mosaiced obser- 
vations, and take proper account of non-coplanar baselines. 

The sparse nature of the covariance matrix for visibil- 
ity data allows the use of sparse matrix techniques which 
significantly reduce the computational burden of evaluating 
the likelihood function, as compared with standard dense 
matrix routines. This enables the maximum-likelihood solu- 
tion to be obtained more efficiently by numerically maximis- 
ing the likelihood using only function values, as opposed to 
more traditional techniques, such as the Newton-Raphson 
algorithm, which rely on gradient and curvature informa- 
tion. We also find the covariance matrix of the errors on the 
parameters is most efficiently calculated using a numerical 
second-differencing approach, rather than direct calculation 
of the analytic expression for the curvature matrix. 

The speed with which the likelihood function is calcu- 
lated may be further increased by making use of the fact 
that only a small fraction of the total number of observed 
visibilities are sensitive to the flat band-power in any one 
spectral bin. Using this reduced data-set for each parame- 
ter enables very fast evaluation of the likelihood function 
along the 'axes' in parameter space. Indeed, we find that 
performing independent line-maximisations of the likelihood 
function along these parameter directions, and iterating un- 
til convergence, provides the computationally most efficient 
way of obtaining the maximum-likelihood solution. 

If the spectral bins are chosen to be sufficiently wide 
that the flat band-powers are quasi-uncorrelated parame- 
ters, one may dispense with the Gaussian approximation 
to the likelihood function near its peak, and obtain (gen- 
erally asymmetric) confidence intervals on each parameter 
by calculating the likelihood function along each parame- 
ter direction through the maximum likelihood point. This 
is best achieved by performing a single-to-noise eigenmode 
rotation for each parameter. Indeed, this method can itself 
be used to arrive at the maximum-likelihood solution. This 
is illustrated by application to simulated observations by the 
Very Small Array in its 'compact' configuration. If multifre- 
quency data are available, this technique may also be used to 
marginalise over the flat band-power of foreground Galactic 
emission in each spectral bin. 
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APPENDIX A: THE GENERAL FORM OF THE VISIBILITIES COVARIANCE MATRIX 

The contribution S(u, v) of the sky signal to the measured visibility V(u, v) is given simply by the convolution of the underlying 
Fourier modes a(u, u) with the aperture function A(u, v), which is itself the Fourier transform of the primary beam A(x, v) 
of a single antenna. Assuming that the jth visibility is measured at the position Uj in the uw-plane and at a frequency Vj, we 
may therefore write 

Sj = S{uj,Vj) = J J A 2 uA{uj - u,Vj)a(u,Vj). (Al) 

If we write the functions A(u, v) and a(u, v) in terms of their real and imaginary parts, 
A(u, v) — Ar(u, v) + iAi(u, v), a(u, v) = a R (u, v) + iai(u, v), 

we quickly find that the real and imaginary parts of Sj are given by 

Sf 1 = J J d 2 u [A R (uj - u, u^a^u, Uj) + Ai(Uj - u, u 3 )a R (u, i/,)] . (A2) 

We note that if the antenna function A{x,v) is real, then Sj fl ' depends only on the real parts a R {u,u) of the underlying 
Fourier modes of the sky, and similarly depends only on ai(u,u). If the primary beam A(x, v) is real and even, then 
A(x, u) is also real and even. 

The elements of the constituent submatrices of the signal covariance matrix S, introduced in equation (^), are most 
conveniently written as 

s (a,R) _ {S W S W } = 1^5;)) + Sti&Si))], 
S^= (S^S?) = i[K((5 l 5;))-5f«5 l 5 J ))], 

s (r,d^ {s (vsp) = |[-3f(( < s i< s;}) + »((«s i «s f »] J 

S^ R) = (SPsp) = |[3f((<S i <S;)) + »((«S i «S i ))] J (A3) 

where K and 9 denote the real and imaginary parts of the, in general, complex quantities {SiS*} and (SiSj), which are given 
by 



{SiS*} = J J d u A(ui — u,Vi) J J & u A*(uj — u ,Vj)(a(u,Vi)a*{u ,Vj)) 
(SiSj), = / / d 2 u A(ui — u,Vi) d 2 u' A(uj — u' ,Vj){a(u,Vi)a(u' 



(R R) 

By definition, the matrix {SiSj} is Hermitian and {SiSj} is symmetric. Similarly, each of the constituent submatrices Sh ' 
and Slj' 1 ^ is, by definition symmetric. It is also straightforward to show that S^' 1 ' = 5 , j[' fl ', so that, as expected, the full 
covariance matrix S is symmetric. Moreover, from ( |A2|) , we see that, if the aperture function A(u, v} is real, and the real and 
imaginary parts of the underlying Fourier modes a(u, u) are uncorrelated, the elements of the off-diagonal submatrices Sjj 
and S'iJ are all zero. Thus, in this case, the full covariance S becomes block diagonal. 
If the sky emission is due to N p physical components, then 



p=l 



a(u, v) = \ a (u,u). 



Moreover, if we assume the spectral index of each component does not vary across the observed field, then we can describe 
the frequency variation of each component through the functions f p (u) (j> — 1,2, ... , N p ). In this paper, we take the reference 
frequency va = 30 GHz and normalise the frequency dependencies so that f p (vo) = 1 for all physical components. If the 
emission from each physical component is statistically isotropic over the observed field and no correlations exist between the 
components, and using that fact that the sky is real, the Fourier modes have mean zero and their covariance properties are 
given by 

<a (p) ( U ^)[a (P VV)n = C M (p,v )f p (v)f p ,(v')S pp ,S(u - u'), 

(aW(u,v)a (p '\u',v')) = C ( - p) (p,u )f p (u)f p ,{ 1 y')S pp ,S(u + u'), (A4) 
where p = \u\ and (p, uq) is the ensemble-average power spectrum of the pth physical compon ent at the reference frequency 



vq. If correlations are known a priori to exist between physical components, then the expressions ( A4) can be straightforwardly 



generalised to include them. Assuming no correlations exist, however, we find that the covariance structure of the Fourier 
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modes of the total sky emission is given by 

(SiS*) = / / A{ui-u,Vi)A*{uj -u,Vj)£(\u\,Vi,Vj) d 2 u, 



A(v,i — u, Ui)A(uj + u, Vj)^(\u\, Ui, Vj) d 2 it, 



(A5) 
(A6) 



where we have introduced the generalised power spectrum £(p, Vi, Vj), which is defined by 



Vt) 



(A7) 



p =i 



We note, in particular, that if the aperture function satisfies A*(u, v) = A{— u, v), which corresponds to a real primary beam 
A{x, v), then 



{S(ui,Vi)Sj{uj,Vj)) = (S(in,Vi)Sj(-Uj,Vj)). 



(A8) 



Thus, in this case, one need only evaluate (SiSj) and use the result ( A8) in order to construct the full signal covariance matrix 
S, whose elements are defined in (A3). 



If we introduce polar coordinates (p, 9) into the un-plane, we can write (A5) and (A6) formally as the one-dimensional 
integrals 



{SiSi) = I d PP wV(p)i(p,i>i,»A 

Jo 

where the window functions Wy (p) and Wy (p) are defined respectively by 
Wii\p) = I dOAiui-u^i^iuj-u^j), 



d9 A(ui — u, Ui)A(uj + u, Vj), 



(A9) 

(A10) 
(All) 



and u = p(cos#, sin©)*. In the case where A*(u,u) = A{—u,v), one need only evaluate W-j (fl) and use (A8) and (A3) to 
construct S. 



APPENDIX B: MOSAICING 

For mosaiced observations, the pointing centres of individual observations are not identical with the global phase-tracking 
centre. This can be accommodated by shifting the primary beam response pattern A(x,v) to the corresponding pointing 
centre. Thus, let us assume that the jth visibility is measured at the position Uj in the wv-plane, at an observing frequency 
Vj, and is referred to the pointing centre Xj. In this case, the expression corresponding to (|3|) becomes 



e — <?r \ I / Al \^-T(x,Vj) 2itiUi-x i2 

Sj=o[Uj,Vj) — A(x — Xj,Vj) ) — e 3 dx 



A(uj — u, Vj)e 



T 



2ivi(uj —u)-x , 



a(u, Uj) d u, 



(Bl) 



where the second equality follows from the Fourier shift theorem. This illustrates that the effect of mosaicing is simply to 
introduce a phase shift, so that the effective aperture function for the jth field becomes 



A e «(u,Vj) =A(u,Vj) e 2ltlu - x i ; 



(B2) 

which is, in general, complex. The covariance properties of the visibilities may then be obtained straightforwardly from the 
expressions given in Appendi x ^| b y replacing A(u, v) with A e s(u, v) given in (Bi ). In this case, however, the corresponding 
window functions (lAldl) and (All) will also be functions of Xi and Xj. In particular, we find 



t-'A / d9A{u l -u,v t )A"'{u ] -u,v :l )e 2lT ™ <x i-^ ) , 



ti/(2)/\ 2ivi(u i Xi +« a x i) I i/i a / \ a / i \ 2iriu - (x a —xa) 

W ii (P) = e I d9 A{Ui - u,Vi)A(uj + u,Vj)e ^ 3 tJ . 



(B3) 



We note that, if the primary beam A(x,v) is real, then the effective aperture function (B2) still satisfies A% s (u,v) = 



Aeff(— u, v). Thus, (|A8|) remains valid and one need only evaluate (SiSj) (and hence W-f (p)) to construct S 
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Although the above description of mosaicing is straightforward, additional subtleties exist in practice, since interferometers 
such as the VSA do not refer the phases of all visibilities to the same constant phase-tracking centre. Instead, the phases of 
visibilities are referred to the pointing centre of the corresponding field. In this case, the visibilities are given by 



Sj=S{uj 1 v j ) = 



A{x,Vj) - J ' e i d x 



T 



J d x 



A(uj - u,Vj)a{u,Uj)t 



3 d u. 



(B4) 



We note that (Bl) and (B4) differ only by a phase shift exp[— 2niuj ■ Xj], However, the vectors Uj are given in different 
coordinate systems, referring in each case to the corresponding phase-tracking centre. From (B4), we see that we may obtain 
the covariance properties of the corresponding visibilities from the equations given in Appendix H provided we replace A(u, Uj) 
by the new effective aperture function for the jth field, which is given by 

Aes(u, Uj) = A(u, Uj) e »«*l«-"i)-t t (B5) 
and similarly for the i term. Hence, in this case, the window functions are given by 

Jo 

WjHp) = / deI(tt j -u, ! / i )A(u j +u, I / j )e 2 " u ' !,> '- a!i) , 



(B6) 



We note that, for a real primary beam, the relation (A8) again holds, so that only (SiSj) (and hence (p)) need be 
calculated in order to construct S. 



APPENDIX C: LARGE FIELDS AND NON-COPLANAR BASELINES 

In Appendices [X] an d | b| we have assumed that the field of view of the telescope is small enough that the tu-distortions 
discussed in section |2.4] can be neglected. Without the assumption of a small field size or vanishing w-components of the 
baselines, the contribution of the sky signal to a measured visibility is given by (e.g. Thompson et al. 1994) 

5 = %,,^,^)= [fd'xAix^^x^e^r^iV^-m^^^ (C1) 

where Wj is the component of the baseline in the direction of the phase-tracking centre. For convenience, we assume that 
pointing and phase-tracking centres are identical. The factor -^/l — \x\ 2 — 1 ~ — i | a; | 2 is small (for example, ^\x\ 2 < 0.001 

at an angle of x — 2.3°). We can thus either omit the factor \x\ 2 ~ 1 or absorb it into the definition of the primary 

beam A(x,v). We obtain 

= JJd 2 xA{x,v J )e-^^ 2 ^f{x,v ] )e 2 ^y\ (C2) 
Defining an effective beam 

A e ff (x, w,Vj) = A(x, Uj) exp(— 7riu;|a;| 2 ), (C3) 
the visibilities are 

Sj — S(uj,Wj,Uj) — JJd 2 uA c ff(uj—u,vjj,Vj)a(u,Uj), (C4) 
where the Fourier transform of A e a is given by a convolution 

A c s(u, w, v) = / / d 2 u A(u — u' , v) — e 1 ^'" l w , (C5) 
J J lw 

The covariance structure of the visibilities can then by found by using this expression in place of A(u,u) in the results of 
Appendix [a|. In particular, we note that if A c s(ui + u, Wi, ui) = A* s ((— Ui) — u, —Wi, Ui), we find 

(S(ui,Wi,Ui)S(uj,Wj,Uj)) = {S(ui,Wi,Ui)S*(-Uj,-Wj,Uj)}. (C6) 



Thus, in this case, the full signal covariance S can be obtained from (A5) and ([A3]). 
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APPENDIX D: GAUSSIAN PRIMARY BEAM 

Let us denote the primary beam at the ith observing frequency i>i by A(x,Vi) = cxp(— |cc| 2 /2af), where <Ji is the dispersion 
of the primary beam at this frequency. The aperture function is simply the Fourier transform of the primary beam, and is 
given by A(u, i/j) = 27rof exp(— 2n 2 af \u\ 2 ). As we now show, in this case it is possible to obtain analytical formulae for the 
window functions W^\p) and W^ 2 \p). 



Dl Two-dimensional approximation 



We begin by considering the general case of mosaiced observations, but in the approximation that the visibilities can be 
obtained by a two-dimensional Fourier transformation of the sky brightness distribution and the primary beam. In that case, 
from (BC), we have 



W- j (p) — (2irai<Tj) z J J A 2 u exp( — 2n' ! af\ui — uj 2 ) exp(— 2iY 2 G 2 \uj — u\ z ) exp(27riit ■ (xj — Xi)), 



(Dl) 



where all visibilities are referred to the pointing centre of the corresponding field as in (B6). If we introduce the vectors 



q = o 2 Ui + (TjUj, x = Xj — Xi 



(D2) 



and use 9 to denote the angle between q and the position vector u in the Fourier plane, and <f) the angle between x and q, we 
find that the window function can be written as 

Wlj(p) = (2-KOiOj) 2 exp[— 2n 2 (cr 2 \ui\ 2 + a 2 \uj\ 2 )] exp[— 2-K 1 (a1 + a 2 )p 2 ] I A9 exp[4-7r 2 |q|pcos 9] exp[27rz|a;|pcos((? + (/>)]. 

Jo 

The integral on 9 can be written as 

Jij(p) — / d# exp[47r 2 |q|pcos#] exp[27ri|a;|p(cos0cos0 — sin#sin0)] = / d$ exp[pcos 9} exp[iacos 9 + ibsinO], (D3) 
Jo Jo 

where p — An 2 \q\p, a — 27r|a;|pcos0 and b = — 27r|a;|psin <f) (compare Gradshteyn & Ryzhik, integrals 3.937). Substituting 
z = exp(i8), Az = iz d9, cos 9 — (z + z~ 1 )/2 and sin 8 = (z — z^ 1 )/(2i), we find 



Jij (p) 



Az — exp 



(p + id 



z + z 



ib- 



2 1 



Az — exp 



(p + 6 + ia)| + (p- b + ia)^j- 



where the curve C is the unit circle in the complex plane. Defining f(z) = z 1 exp[Az + Bz 1 ] , where A — (p + b + ia)/2 and 
B = (p — b + ia)/2, we obtain 

Jij(p) = <fAz-f(z) = 27rRes(/,0), 
Jc 1 

where Res(/, 0) denotes the residue of f(z) around the essential singularity z = 0. The Laurent series of / about this point 
has the form f(z) — 5Z°° = _ c m(z — 0) m and is given by 



tt \ 1 r a o -ii 1 (Az + Bz 
f(z) = -eMAz + Bz ] = -^ K - ~r 

m = Q 

oo m 

-VV —. m C n A n 

z ml 



oo m 



z * — * * — ' ml 

m — 7?.=0 



^-^ ^-^ num. - 



n\{m — n)\ 



A B z , 



from which we can read off 



, . ^{ABT ^{\VMBf 

Re S (/,0)=c_ 1 =^ ±>=J2- 

n=0 n=0 



(n!) 2 



Jo (VIZI), 



where Io(z) = ^Z^Lod I ( n ') 2 16 ^ ne modified Bessel function of the first kind and order zero for a complex argument z. 
Substituting all the abbreviations, we have 



4AB =p 2 ~a 2 ~b 2 + 2iap = [(27rp) 2 [(27r|q|) 2 - \x\ 2 + i(4nq ■ x) 



Thus, the window function is given by the (complex) analytical expression 
w jP{p) = (27r) 3 (cricrj) 2 exp[-27r 2 (a 2 |u,| 2 + cr||itj | 2 )] exp[-27r 2 (CT 2 + cr 2 )p 2 ]7o(27rVfcp), 



% 3 

where 

k = (27r|q|) 2 — \x\ 2 + i(4nq ■ x). 
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The full signal covariance matri x S can then be derived using (A9) and (A£). 

A useful check of our result (D5) is provided by the special case of a single field, for which Xi = Xj, and so x — Xj —Xi — 0. 
Thus, Jij (p) is given by the real integral 



dB exp[47r 2 |q|pcos 9] = / d6 exp[47r 2 |q|psin#] = 2nIo(4:iT 2 \q\p). 



Jij(p) 



Therefore, in this case, the elements of the window function are given by the real expression 

W ii (p) = (2 7r ) 3 ( a ~i< J j) 2 exp[-27r 2 (o- 2 |iti| 2 + cr 2 |uj| 2 )] exp[-27r 2 (cr 2 + a 2 )p 2 ]I (4n 2 \a 2 Ui + a 2 Uj\p), 



which is identical to setting x = in (D6). Comparison with (A3) shows that the offdiagonal blocks 5, 
signal covariance matrix vanish and real and imaginary parts of the visibilities can be treated independently. 



and of the 



D2 Non-coplanar baselines and the w-correction 



We now consider the effects of non- vanishing ^-components on our expression (D5) for the window function Wy . From (C3), 
for a Gaussian primary beam of width a, the effective primary beam is given by A e ff(x, w, v) = exp(— \x\ 2 (^pz +Z7rw)). On 
Fourier transforming, we obtain the effective aperture function 

1 — id) -2tt 2 o- 2 1_i t I7/.I 2 . . o. _^„2-2 



A e s(u, w, v) = 2na 



l + ( 



10 -2n'a* 

-e !+* 



-I«r 



= 2tt<7 (1 



2n 2 a 2 {l-i(f>)\-u\' ! 



Dl 



where d> = 2no 2 w and a 2 = a 2 /(l + d> 2 ) = cr 2 /(l + \-k 2 o^w 2 ). 

We see that the effect of the w-distortion can be considered as turning the primary beam into a complex Gaussian, wh ose 
Fourier transform is again a complex Gaussian. This can easily be taken into account in the analysis from section 
replacing 

t — t = a (1 — up) 



by 



1 + 



in (|Dl|). For the angular integration (D3), we then have to use the vectors 
q = a 2 Ui + o 2 Uj, x = Xj — Xi + 2n (4>j0 2 Uj — d)icr 2 Ui} 

instead of (D2). The final expression for the window function W^j (p) is 

3 (a l a J ) 2 (l + I j +i(d: 3 -fa)) e -2- 2 (*, 2 l«.l 2 +^ 2 l« 3 l 2 ) g^t^V^-^V,! 2 ) 

2 I |2 



W^(p) = (2^(9^(1 + 4)^ + ^ 



(D7) 



where now k = (27r|q|) — \x\ + i(47rq ■ x) as in 



4>i = 2-KG 2 Wi. Furthermore, we note that, A c n(uj_+ u, Wj, Vj] = A* s ((— Ui) — u, 
full signal covariance S can be obtained from (G6) and (AJ). 



but with q and x defined in (D7), and a* — + 0i) with 



and so (C6) is satisfied. Thus, the 



APPENDIX E: SIGNAL-TO-NOISE EIGENMODES 



As discussed in section 5.3.2, it is useful to calculate the log- likelihood function along each coordinate direction in parameter 



space. In other words, one wishes to calculate the log-likelihood as a function of a sin gle p arameter a^, while the remaining 



parameters are kept fixed; we denote this by ln£.(v\a,k). In fact, as shown in section 5.3, for interferometer data one may 
replace the full data vector, v, with a reduced data vector that contains only those visibilities sensitive to changes in the 
flat band-power a^. Hence, for various values of at, one must evaluate 

ln£(vk\a,k) = constant — | [in |C(a)| + i>fcC -1 (a)t)fc] . 

This calculation is neatly performed by using the method of signal-to-noise eigenmodes, which are a special case of Karhunen- 
Loeve modes, where the parameter dependence is linear (affine) (Bond 1995; Bond, Jaffe & Knox 1998; Tegmark, Taylor & 
Heavens 1997). 

The covariance matrix C(a) = S(a) + N contains contributions from the sky signal and the noise. Since the parameters 
are flat band-powers in spectral bins, we can write the signal contribution as 

S(a) = Si(a k ) + S 2 (a), 

where a denotes collectively the fixed parameters ay (k' 7^ k). Hence, Si depends only on the parameter of interest and 
depends only on the (fixed) values of the other parameters, and is hence a fixed matrix. Moreover, the matrix Si is linear in 
ak, since it is simply a flat band-power, and so we may write Si(ak) = a-kU , where the fixed matrix U = Si(l). Thus the 
total covariance matrix can be written as 

C(a) = a k U + S 2 {a) +N = a k U + V, 
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where U is the fixed 'unit signal' covariance matrix and V is the fixed 'generalised noise' matrix. 

Since U and V are both real symmetric matrices, they can be diagonalised simultaneously by a single similarity trans- 
formation. This is most easily achieved by solving the generalised eigenproblem 

Ux = XVx. 

Let us denote the corresponding eigenvalues by and the eigenvectors by e it which are normalised such that e\Vei = 1. 
If we now define a new set of 'rotated' data by Q = ej • Vk, then it is straightforward to show that these new variables are 
uncorrelated for any value of the parameter a,k, and have a covariance matrix with elements 

(CiCi) = (l + akXJSij. (El) 

The called the signal-to-noise eigenmodes, and each is a linear combination of the original data items. Modes corresponding 
to a large eigenvalue are expected to be well-determined, while modes with small eigenvalues are poorly-determined and do 
not contribute significantly to the value of the likelihood function. 

From (El), we see that the log- likelihood, as a function of the parameter at (with the others held fixed) is given simply 

by 



m£(£|a fe ) = constant — | 



m(l + a fe Ai)+ " 



1 + a,k\i 



This is trivial to evaluate and so the log-likelihood function can be calculated at a large number of points along in a^-direction 
at virtually no extra computational cost. 

This paper has been produced using the Royal Astronomical Society /Blackwell Science MgX style file. 
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